Direct Modeling of Spoken Passwords for Text-dependent Speaker Recognition by Compressed Time-feature Representations
Authors
Abstract
Traditional Text-Dependent Speaker Recognition (TDSR) systems model user-specific spoken passwords with frame-based features such as MFCC and use DTW or HMM-type classifiers to handle the variable length of the feature vector sequence. In this paper, we explore direct modeling of the entire spoken password by a fixed-dimension vector called Compressed Feature Dynamics, or CFD. Instead of the usual frame-by-frame feature extraction, the entire password utterance is first modeled by a 2-D Featurogram, or FGRAM, which efficiently captures speaker-identity-specific speech dynamics. CFDs are compressed, approximated versions of the FGRAMs, and their fixed dimension allows the use of simpler classifiers. Overall, the proposed FGRAM-CFD framework provides an efficient and direct model that captures speaker-identity information well for a TDSR system. As demonstrated in trials on a 344-speaker database, compared to traditional MFCC-based TDSR systems, the FGRAM-CFD framework shows quite encouraging performance at significantly lower complexity.
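The abstract does not say how an FGRAM is built or how it is compressed into a CFD, so the following is only a minimal sketch of the general idea of mapping a variable-length feature-by-time matrix to a fixed-dimension vector. It assumes a truncated 2-D DCT as the compression step, which is a stand-in and not necessarily the authors' method; the names `dct_matrix` and `compress_feature_matrix` and the parameters `keep_feat` and `keep_time` are hypothetical.

```python
import numpy as np


def dct_matrix(n_out: int, n_in: int) -> np.ndarray:
    """Return the first n_out rows of the orthonormal DCT-II basis for length n_in."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    basis = np.cos(np.pi * (2 * n + 1) * k / (2 * n_in))
    scale = np.full((n_out, 1), np.sqrt(2.0 / n_in))
    scale[0, 0] = np.sqrt(1.0 / n_in)
    return scale * basis


def compress_feature_matrix(features, keep_feat: int = 8, keep_time: int = 8) -> np.ndarray:
    """Map a (n_features, n_frames) matrix to a vector of length keep_feat * keep_time.

    Assumption: a truncated 2-D DCT stands in for the compression step,
    which the abstract leaves unspecified.
    """
    features = np.asarray(features, dtype=float)
    n_feat, n_frames = features.shape
    if keep_feat > n_feat or keep_time > n_frames:
        raise ValueError("cannot keep more coefficients than the matrix provides")
    d_feat = dct_matrix(keep_feat, n_feat)      # transform along the feature axis
    d_time = dct_matrix(keep_time, n_frames)    # transform along the time axis
    coeffs = d_feat @ features @ d_time.T       # low-order 2-D DCT coefficients
    return coeffs.ravel()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    short_utt = rng.standard_normal((13, 80))   # hypothetical 13 features x 80 frames
    long_utt = rng.standard_normal((13, 140))   # same password, spoken more slowly
    # Both utterances yield vectors of the same length despite different durations.
    print(compress_feature_matrix(short_utt).shape, compress_feature_matrix(long_utt).shape)
```

Because every utterance maps to a vector of the same length, two passwords can be compared with a plain distance or a simple classifier, without the time alignment that DTW or an HMM would provide. In this sketch the coefficient magnitudes still depend on utterance length, so normalising each vector before comparison may be advisable.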
Similar resources
Text-dependent speaker recognition by efficient capture of speaker dynamics in compressed time-frequency representations of speech
Prevalent speaker recognition methods use only spectral-envelope-based features such as MFCC, ignoring the rich speaker identity information contained in the temporal-spectral dynamics of the entire speech signal. We propose a new feature called compressed spectral dynamics or CSD for speaker recognition based on compressed time-frequency representations of spoken passwords which effectively ca...
On the Influence of Text Content on Pass-Phrase Strength for Short-Duration Text-Dependent Automatic Speaker Authentication
In the context of automatic speaker verification it is well known that different speech units offer different levels of speaker discrimination. For short-duration, text-dependent automatic speaker recognition, a user’s pass-phrase bears influence on how reliably they can be recognized; just as is the case with text passwords, some spoken pass-phrases are more secure than others. This paper inve...
Speech Recognition as Feature Extraction for Speaker Recognition
Information from speech recognition can be used in various ways in state-of-the-art speaker recognition systems. This includes the obvious use of recognized words to enable the use of text-dependent speaker modeling techniques when the words spoken are not given. Furthermore, it has been shown that the choice of words and phones itself can be a useful indicator of speaker identity. Also, recogn...
Assamese Vowel Phoneme Recognition Using Zero Crossing Rate and Short-time Energy
Speaker recognition is the identification of the person who is speaking by the characteristics of their voice. Assamese is an Indo-Aryan language, mainly spoken in North-Eastern India. In this paper, a text-dependent speaker modelling technique is used. The system comprises a training phase, a testing phase and a recognition phase. The database consists of utterances of 10 speake...
Text independent speaker recognition using speaker dependent word spotting
This paper is motivated by the fact that text-dependent speaker recognition is inherently more accurate than text-independent speaker recognition. In this work we assign models to frequent words spoken by a speaker and spot them in a test call. In this way, text-dependent speaker recognition technology can be used for text-independent tasks. The approach we take is to use DTW (Dynamic Time Warp...